 Minimum Complexity Machines


Causal Discovery from Event Sequences by Local Cause-Effect Attribution

Neural Information Processing Systems

Sequences of events, such as crashes in the stock market or outages in a network, contain strong temporal dependencies, whose understanding is crucial to react to and influence future events. In this paper, we study the problem of discovering the underlying causal structure from event sequences. To this end, we introduce a new causal model, where individual events of the cause trigger events of the effect with dynamic delays. We show that in contrast to existing methods based on Granger causality, our model is identifiable for both instant and delayed effects. We base our approach on the Algorithmic Markov Condition, by which we identify the true causal network as the one that minimizes the Kolmogorov complexity. As the Kolmogorov complexity is not computable, we instantiate our model using Minimum Description Length and show that the resulting score identifies the causal direction. To discover causal graphs, we introduce the Cascade algorithm, which adds edges in topological order. Extensive evaluation shows that Cascade outperforms existing methods in settings with instantaneous effects, noise, and multiple colliders, and discovers insightful causal graphs on real-world data.
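To make the MDL intuition concrete, here is a toy sketch, not the Cascade encoding: it compares the bits needed to send effect timestamps on their own against sending each effect as "which cause event triggered it, plus a delay". The uniform baseline, the geometric delay code, the parameter `p`, and the "most recent cause" trigger rule are all illustrative assumptions. A delay of 0 stands in for an instantaneous effect.

```python
import math

def uniform_bits(effect_times, horizon):
    """Bits to send each effect timestamp on its own, uniformly over `horizon` slots."""
    return len(effect_times) * math.log2(horizon)

def geometric_bits(delay, p):
    """Bits for a non-negative integer delay under P(d) = (1 - p)**d * p."""
    return -(delay * math.log2(1 - p) + math.log2(p))

def cause_conditioned_bits(cause_times, effect_times, p=0.5):
    """Bits to send each effect as (which cause fired it, delay after that cause).

    Toy assumptions: every effect has a cause event at or before it, the most
    recent such cause is taken as the trigger, and delays are geometric.
    """
    total = 0.0
    for t in effect_times:
        earlier = [c for c in cause_times if c <= t]
        if not earlier:
            raise ValueError(f"effect at {t} has no preceding cause")
        delay = t - max(earlier)
        # index of the triggering cause + geometric code for the delay
        total += math.log2(len(cause_times)) + geometric_bits(delay, p)
    return total

cause = [3, 17, 42, 80, 95]
effect = [c + 1 for c in cause]  # each cause triggers an effect one step later
print(uniform_bits(effect, horizon=100))      # about 33.2 bits
print(cause_conditioned_bits(cause, effect))  # about 21.6 bits
```

The cause-conditioned code is shorter, which is the kind of gap an MDL score exploits to prefer the edge cause → effect. A full score would also charge bits for the model itself (here, `p`), which this sketch omits.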


Introducing Routing Uncertainty in Capsule Networks

Neural Information Processing Systems

Rather than performing inefficient local iterative routing between adjacent capsule layers, we propose an alternative global view based on representing the inherent uncertainty in part-object assignment. In our formulation, the local routing iterations are replaced with variational inference of part-object connections in a probabilistic capsule network, leading to a significant speedup without sacrificing performance. In this way, global context is also considered when routing capsules by introducing global latent variables that have direct influence on the objective function, and are updated discriminatively in accordance with the minimum description length (MDL) principle. We focus on enhancing capsule network properties, and perform a thorough evaluation on pose-aware tasks, observing improvements in performance over previous approaches whilst being more computationally efficient.


Counterfactual Basis Extension and Representational Geometry: An MDL-Constrained Model of Conceptual Growth

Amornbunchornvej, Chainarong

arXiv.org Machine Learning

Concept learning becomes possible only when existing representations fail to account for experience. Most models of learning and inference, however, presuppose a fixed representational basis within which belief updating occurs. In this paper, I address a prior question: under what structural conditions can the representational basis itself expand in a principled and selective way? I propose a geometric framework in which conceptual growth is modeled as admissible basis extension evaluated under a Minimum Description Length (MDL) criterion. Experience, whether externally observed or internally simulated, is represented as vectors relative to a current conceptual subspace. Residual components capture systematic representational failure, and candidate conceptual extensions are restricted to low-rank, admissible transformations. I show that any MDL-accepted extension can be chosen so that its novel directions lie entirely within the residual span induced by experience, while extensions orthogonal to this span strictly increase description length and are therefore rejected. This yields a conservative account of imagination and conceptual innovation. Internally generated counterfactual representations contribute to learning only insofar as they expose or amplify structured residual error, and cannot introduce arbitrary novelty. I further distinguish representational counterfactuals (counterfactuals over an agent's conceptual basis) from causal or value-level counterfactuals, and show how MDL provides a normative selection principle governing representational change. Overall, the framework characterizes conceptual development as an error-driven, geometry-constrained process of basis extension, clarifying both the role and the limits of imagination in learning and theory change.
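The residual-span argument can be checked numerically. The sketch below assumes plain Euclidean geometry with least-squares projection and uses squared residual norm as a stand-in for the data part of the description length; these choices are illustrative, not the paper's exact formulation. A direction taken from the residual span removes residual error, while a direction orthogonal to both the basis and the residual span removes none, so any positive model cost for it makes the total description length strictly larger.

```python
import numpy as np

def residuals(basis, experience):
    """Part of each experience vector (a column) that the current basis cannot express."""
    q, _ = np.linalg.qr(basis)  # orthonormal columns spanning the current subspace
    return experience - q @ (q.T @ experience)

def residual_energy(basis, experience):
    """Squared residual norm: a stand-in for the data part of the description length."""
    return float(np.sum(residuals(basis, experience) ** 2))

def best_residual_direction(basis, experience):
    """Rank-1 extension taken from the residual span (top left singular vector)."""
    u, _, _ = np.linalg.svd(residuals(basis, experience), full_matrices=False)
    return u[:, 0]

rng = np.random.default_rng(0)
experience = np.zeros((4, 10))
experience[:2] = rng.normal(size=(2, 10))  # experience lives in span(e1, e2)
basis = np.eye(4)[:, :1]                   # current concepts: span(e1) only

inside = np.column_stack([basis, best_residual_direction(basis, experience)])
outside = np.column_stack([basis, np.eye(4)[:, 2]])  # e3: orthogonal to the residual span
print(residual_energy(basis, experience))    # baseline residual
print(residual_energy(inside, experience))   # ~0: residual fully explained
print(residual_energy(outside, experience))  # unchanged from baseline
```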


Softly Symbolifying Kolmogorov-Arnold Networks

Bagrow, James, Bongard, Josh

arXiv.org Machine Learning

Kolmogorov-Arnold Networks (KANs) offer a promising path toward interpretable machine learning: their learnable activations can be studied individually, while collectively fitting complex data accurately. In practice, however, trained activations often lack symbolic fidelity, learning pathological decompositions with no meaningful correspondence to interpretable forms. We propose Softly Symbolified Kolmogorov-Arnold Networks (S2KAN), which integrate symbolic primitives directly into training. Each activation draws from a dictionary of symbolic and dense terms, with learnable gates that sparsify the representation. Crucially, this sparsification is differentiable, enabling end-to-end optimization, and is guided by a principled Minimum Description Length objective. When symbolic terms suffice, S2KAN discovers interpretable forms; when they do not, it gracefully degrades to dense splines. We demonstrate competitive or superior accuracy with substantially smaller models across symbolic benchmarks, dynamical systems forecasting, and real-world prediction tasks, and observe evidence of emergent self-sparsification even without regularization pressure.
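As a rough illustration of "symbolic when it suffices, dense otherwise", the sketch below swaps S2KAN's learnable, differentiable gates for a hard, non-differentiable choice: it fits each candidate term by least squares and keeps the one with the shortest two-part code. The primitive dictionary, the piecewise-linear "hat" spline used as the dense fallback, and the BIC-style code length are all hypothetical stand-ins, not the paper's components.

```python
import numpy as np

# Hypothetical primitive dictionary; S2KAN's actual dictionary may differ.
PRIMITIVES = {"x": lambda x: x, "x^2": lambda x: x ** 2, "sin": np.sin, "exp": np.exp}

def two_part_bits(y, y_hat, n_params):
    """BIC-style code length (constants dropped): residual bits + (k/2) log2 n parameter bits."""
    n = len(y)
    rss = max(float(np.sum((y - y_hat) ** 2)), 1e-12)  # guard against log(0)
    return 0.5 * n * np.log2(rss / n) + 0.5 * n_params * np.log2(n)

def hat_basis(x, knots):
    """Piecewise-linear 'hat' functions on evenly spaced knots (a simple dense spline)."""
    width = knots[1] - knots[0]
    return np.clip(1 - np.abs(x[:, None] - knots[None, :]) / width, 0, None)

def select_term(x, y, n_knots=8):
    """Score a*f(x) + b for each primitive and a dense spline; return the cheapest."""
    scores = {}
    for name, f in PRIMITIVES.items():
        design = np.column_stack([f(x), np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        scores[name] = two_part_bits(y, design @ coef, n_params=2)
    knots = np.linspace(x.min(), x.max(), n_knots)
    design = hat_basis(x, knots)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    scores["dense"] = two_part_bits(y, design @ coef, n_params=n_knots)
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
noise = rng.normal(scale=0.01, size=x.size)
print(select_term(x, 2 * np.sin(x) + 0.5 + noise)[0])  # sin: a symbolic term suffices
knots = np.linspace(-3, 3, 8)
wiggly = np.interp(x, knots, rng.normal(size=8)) + noise
print(select_term(x, wiggly)[0])  # dense: no primitive fits
```

Only differences between scores matter, so the dropped constants (and the negative values they produce) are harmless.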


drawing connections to Feldman's work (L36), but we agree that the relation between the three topics should be

Neural Information Processing Systems

Thank you all for your thoughtful comments; we address your concerns below. The MDL principle formalizes Occam's razor and is a […] We will add the discussion of such relevant studies to section 1. We will add these results and accompanying visualizations to the appendix. Model (solver): MAC; DAFT MAC (euler); DAFT MAC (rk4); DAFT MAC (dopri5; used in training). Time (ms): 153 […]. We found that during evaluation, rk4 solves all the dynamics generated from the CLEVR dataset.



Compressing Chemistry Reveals Functional Groups

Sharma, Ruben, King, Ross D.

arXiv.org Artificial Intelligence

We introduce the first formal large-scale assessment of the utility of traditional chemical functional groups as used in chemical explanations. Our assessment employs a fundamental principle from computational learning theory: a good explanation of data should also compress the data. We introduce an unsupervised learning algorithm based on the Minimum Message Length (MML) principle that searches for substructures that compress around three million biologically relevant molecules. We demonstrate that the discovered substructures contain most human-curated functional groups as well as novel larger patterns with more specific functions. We also run our algorithm on 24 specific bioactivity prediction datasets to discover dataset-specific functional groups. Fingerprints constructed from dataset-specific functional groups are shown to significantly outperform other fingerprint representations, including the MACCS and Morgan fingerprints, when training ridge regression models on bioactivity regression tasks.
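The principle that a useful substructure should pay for itself in bits can be sketched on strings. This is a flattened analogue, not the paper's MML algorithm, which searches molecular graphs: SMILES-like strings stand in for molecules, a fixed-length code stands in for the MML message, and promoting a pattern to one new symbol stands in for adding a substructure. The corpus is made up for illustration.

```python
import math

def fixed_code_bits(n_symbols, alphabet_size):
    """Bits for n_symbols under a fixed-length code over the alphabet."""
    return n_symbols * math.log2(alphabet_size)

def pattern_gain(corpus, pattern):
    """Bits saved by promoting `pattern` to a single new alphabet symbol.

    Two-part toy code: (data re-encoded with the enlarged alphabet)
    + (pattern spelled out once in the original alphabet).
    """
    k = len(set("".join(corpus)))
    total = sum(len(s) for s in corpus)
    before = fixed_code_bits(total, k)
    hits = sum(s.count(pattern) for s in corpus)  # non-overlapping occurrences
    after_symbols = total - hits * (len(pattern) - 1)
    after = fixed_code_bits(after_symbols, k + 1) + fixed_code_bits(len(pattern), k)
    return before - after

corpus = ["CC(=O)O", "CCC(=O)O", "OC(=O)CC(=O)O", "CCN", "CCCN"]
print(round(pattern_gain(corpus, "C(=O)O"), 1))  # 18.8: the carboxyl-like pattern pays for itself
print(round(pattern_gain(corpus, "CCCN"), 1))    # -9.7: too rare to be worth a new symbol
```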


The Limits of AI Explainability: An Algorithmic Information Theory Approach

Rao, Shrisha

arXiv.org Artificial Intelligence

This paper establishes a theoretical foundation for understanding the fundamental limits of AI explainability through algorithmic information theory. We formalize explainability as the approximation of complex models by simpler ones, quantifying both approximation error and explanation complexity using Kolmogorov complexity. Our key theoretical contributions include: (1) a complexity gap theorem proving that any explanation significantly simpler than the original model must differ from it on some inputs; (2) precise bounds showing that explanation complexity grows exponentially with input dimension but polynomially with error tolerance for Lipschitz functions; and (3) a characterization of the gap between local and global explainability, demonstrating that local explanations can be significantly simpler while maintaining accuracy in relevant regions. We further establish a regulatory impossibility theorem proving that no governance framework can simultaneously pursue unrestricted AI capabilities, human-interpretable explanations, and negligible error. These results highlight considerations likely to be relevant to the design, evaluation, and oversight of explainable AI systems.
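Kolmogorov complexity is uncomputable, but the length of any lossless compression is a computable upper bound on it, up to an additive constant for the decompressor. The sketch below uses `zlib` as that proxy on the truth table of a Boolean function over 12-bit inputs (2**12 entries): a table generated by a short rule compresses to a small fraction of the size of a random table. It illustrates the proxy only; it does not reproduce the paper's bounds.

```python
import random
import zlib

def truth_table(f, n_bits):
    """Outputs of a Boolean function on all 2**n_bits inputs, one byte per entry."""
    return bytes(ord("0") + f(x) for x in range(2 ** n_bits))

def compressed_bits(data):
    """Compressed size in bits: a computable upper-bound proxy for Kolmogorov complexity."""
    return 8 * len(zlib.compress(data, 9))

n = 12
threshold = truth_table(lambda x: int(x >= 1000), n)  # generated by a short rule
rng = random.Random(0)
noise = truth_table(lambda x: rng.getrandbits(1), n)  # random table: almost surely incompressible
print(compressed_bits(threshold), compressed_bits(noise))
```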


The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence

Dittrich, Christian, Kinne, Jennifer Flygare

arXiv.org Artificial Intelligence

Existing frameworks converge on the centrality of compression to intelligence but leave underspecified why this process enforces the discovery of causal structure rather than superficial statistical patterns. We introduce a two-level framework to address this gap. The Information-Theoretic Imperative (ITI) establishes that any system persisting in uncertain environments must minimize epistemic entropy through predictive compression: this is the evolutionary "why" linking survival pressure to information-processing demands. The Compression Efficiency Principle (CEP) specifies how efficient compression mechanically selects for generative, causal models through exception-accumulation dynamics, making reality alignment a consequence rather than a contingent achievement. Together, ITI and CEP define a causal chain: from survival pressure to prediction necessity, compression requirement, efficiency optimization, generative structure discovery, and ultimately reality alignment. Each link follows from physical, information-theoretic, or evolutionary constraints, implying that intelligence is the mechanically necessary outcome of persistence in structured environments. This framework yields empirically testable predictions: compression efficiency, measured as approach to the rate-distortion frontier, correlates with out-of-distribution generalization; exception-accumulation rates differentiate causal from correlational models; hierarchical systems exhibit increasing efficiency across abstraction layers; and biological systems demonstrate metabolic costs that track representational complexity. ITI and CEP thereby provide a unified account of convergence across biological, artificial, and multi-scale systems, addressing the epistemic and functional dimensions of intelligence without invoking assumptions about consciousness or subjective experience.
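"Approach to the rate-distortion frontier" needs a frontier to measure against. For a Bernoulli(p) source under Hamming distortion the frontier has the closed form R(D) = H_b(p) − H_b(D) for 0 ≤ D < min(p, 1 − p), and 0 otherwise. The sketch below computes it and the gap between a coder's rate and the frontier; the coder rate of 0.7 bits per symbol is hypothetical, and how the paper would measure efficiency on real systems is not specified here.

```python
import math

def binary_entropy(q):
    """H_b(q) in bits, with H_b(0) = H_b(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def rate_distortion_bernoulli(p, d):
    """R(D) for a Bernoulli(p) source under Hamming distortion (bits per symbol)."""
    if d >= min(p, 1 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(d)

def efficiency_gap(rate, p, d):
    """How far a coder's measured rate sits above the frontier at distortion d."""
    return rate - rate_distortion_bernoulli(p, d)

print(rate_distortion_bernoulli(0.5, 0.0))   # 1.0: lossless coding of fair bits
print(rate_distortion_bernoulli(0.5, 0.11))  # about 0.5
print(efficiency_gap(0.7, 0.5, 0.11))        # hypothetical coder: about 0.2 bits above R(D)
```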


Symbolic Snapshot Ensembles

Liu, Mingyue, Cropper, Andrew

arXiv.org Artificial Intelligence

Inductive logic programming (ILP) is a form of logical machine learning. Most ILP algorithms learn a single hypothesis from a single training run. Ensemble methods train an ILP algorithm multiple times to learn multiple hypotheses. In this paper, we train an ILP algorithm only once and save intermediate hypotheses. We then combine the hypotheses using a minimum description length weighting scheme. Our experiments on multiple benchmarks, including game playing and visual reasoning, show that our approach improves predictive accuracy by 4% with less than 1% computational overhead.
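A common way to turn description lengths into ensemble weights is w_i ∝ 2^(−L_i), so a hypothesis that is one bit shorter gets twice the weight. The sketch below assumes that scheme for saved snapshot hypotheses; the paper's exact weighting, and how it measures L_i (for example program size plus bits for misclassified examples), may differ, and the lengths used here are made up.

```python
import math

def mdl_weights(description_lengths):
    """Normalised weights w_i proportional to 2**(-L_i); shifting by the minimum avoids underflow."""
    m = min(description_lengths)
    raw = [2.0 ** (-(length - m)) for length in description_lengths]
    z = sum(raw)
    return [r / z for r in raw]

def weighted_vote(predictions, description_lengths):
    """Combine boolean predictions from several hypotheses; True if the weighted share exceeds 0.5."""
    w = mdl_weights(description_lengths)
    score = sum(wi for wi, p in zip(w, predictions) if p)
    return score > 0.5

lengths = [10, 11, 20]  # hypothetical description lengths (bits) of three snapshots
print(mdl_weights(lengths))
print(weighted_vote([True, False, False], lengths))  # True: the shortest hypothesis dominates
print(weighted_vote([False, True, True], lengths))   # False
```

Because only one training run is needed to collect the snapshots, the extra cost is just scoring and voting, which matches the low overhead the abstract reports.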